175 research outputs found
Non-asymptotic convergence analysis for the Unadjusted Langevin Algorithm
In this paper, we study a method to sample from a target distribution π over ℝ^d having a positive density with respect to the Lebesgue measure, known up to a normalisation factor. This method is based on the Euler discretization of the overdamped Langevin stochastic differential equation associated with π. For both constant and decreasing step sizes in the Euler discretization, we obtain non-asymptotic bounds for the convergence to the target distribution in total variation distance. Particular attention is paid to the dependency on the dimension d, to demonstrate the applicability of this method in the high-dimensional setting. These bounds improve and extend the results of Dalalyan (2014).
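The Euler discretization the abstract refers to is the iteration X_{k+1} = X_k − γ∇U(X_k) + √(2γ) Z_{k+1}, where π ∝ e^{−U} and the Z_k are standard Gaussians. A minimal sketch, with the target (a standard Gaussian) and the step size chosen purely for illustration, not taken from the paper:

```python
import numpy as np

def ula(grad_u, x0, step, n_iters, rng):
    """Unadjusted Langevin Algorithm with a constant step size.

    Iterates X_{k+1} = X_k - step * grad_u(X_k) + sqrt(2*step) * Z_{k+1},
    the Euler discretization of dX_t = -grad_u(X_t) dt + sqrt(2) dB_t.
    """
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_iters, x.size))
    for k in range(n_iters):
        noise = rng.standard_normal(x.size)
        x = x - step * grad_u(x) + np.sqrt(2.0 * step) * noise
        samples[k] = x
    return samples

# Illustrative target: standard Gaussian, U(x) = ||x||^2 / 2, so grad U(x) = x.
rng = np.random.default_rng(0)
samples = ula(lambda x: x, x0=np.zeros(2), step=0.05, n_iters=20000, rng=rng)
burned = samples[5000:]
print(burned.mean(axis=0))  # close to 0
print(burned.var(axis=0))   # roughly 1, up to an O(step) discretization bias
```

With a constant step the chain targets a biased approximation of π, which is exactly why the non-asymptotic bounds above track the step size and the dimension.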
High-dimensional Bayesian inference via the Unadjusted Langevin Algorithm
We consider in this paper the problem of sampling a high-dimensional probability distribution π having a density with respect to the Lebesgue measure on ℝ^d, known up to a normalization constant. Such a problem naturally occurs for example in Bayesian inference and machine learning. Under the assumption that the potential U = −log π is continuously differentiable, its gradient is globally Lipschitz and U is strongly convex, we obtain non-asymptotic bounds for the convergence to stationarity in Wasserstein distance of order 2 and in total variation distance of the sampling method based on the Euler discretization of the Langevin stochastic differential equation, for both constant and decreasing step sizes. The dependence on the dimension of the state space of these bounds is explicit. The convergence of an appropriately weighted empirical measure is also investigated and bounds for the mean square error and exponential deviation inequality are reported for functions which are measurable and bounded. An illustration to Bayesian inference for binary regression is presented to support our claims.
Comment: Supplementary material available at https://hal.inria.fr/hal-01176084/. arXiv admin note: substantial text overlap with arXiv:1507.0502
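The decreasing step-size regime analyzed here can be sketched in the same ULA template, with a schedule γ_k = γ_1 k^{−α}; the particular target (a one-dimensional Gaussian) and the schedule parameters are assumptions for the demo only:

```python
import numpy as np

def ula_decreasing(grad_u, x0, gamma1, alpha, n_iters, rng):
    """ULA with a decreasing step-size schedule gamma_k = gamma1 * k**(-alpha).

    With decreasing steps the discretization bias vanishes along the run,
    so the (unadjusted) iterates can converge to the target itself.
    """
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_iters, x.size))
    for k in range(1, n_iters + 1):
        gamma = gamma1 * k ** (-alpha)
        noise = rng.standard_normal(x.size)
        x = x - gamma * grad_u(x) + np.sqrt(2.0 * gamma) * noise
        samples[k - 1] = x
    return samples

# Illustrative strongly log-concave target: pi ~ N(1, 1/2),
# i.e. U(x) = (x - 1)^2 with grad U(x) = 2 (x - 1).
rng = np.random.default_rng(1)
s = ula_decreasing(lambda x: 2.0 * (x - 1.0), np.zeros(1),
                   gamma1=0.1, alpha=0.5, n_iters=50000, rng=rng)
print(s[10000:].mean())  # close to the target mean 1
```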
Bridging the Gap between Constant Step Size Stochastic Gradient Descent and Markov Chains
We consider the minimization of an objective function given access to
unbiased estimates of its gradient through stochastic gradient descent (SGD)
with constant step-size. While the detailed analysis was only performed for
quadratic functions, we provide an explicit asymptotic expansion of the moments
of the averaged SGD iterates that outlines the dependence on initial
conditions, the effect of noise and the step-size, as well as the lack of
convergence in the general (non-quadratic) case. For this analysis, we bring tools from Markov chain theory into the analysis of stochastic gradient descent. We then show that Richardson-Romberg extrapolation may be used to get closer to the global optimum, and we show empirical improvements of the new extrapolation scheme.
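The extrapolation idea can be sketched as follows: run constant-step averaged SGD twice, with steps γ and 2γ, and combine the averages as 2·θ̄_γ − θ̄_{2γ} so that the first-order step-size bias cancels. The objective and noise model below are illustrative assumptions, not the paper's experiments:

```python
import numpy as np

def averaged_sgd(grad, noise_std, x0, step, n_iters, rng):
    """Constant step-size SGD; returns the Polyak-Ruppert average of the iterates."""
    x, total = x0, 0.0
    for _ in range(n_iters):
        g = grad(x) + noise_std * rng.standard_normal()  # unbiased gradient estimate
        x = x - step * g
        total += x
    return total / n_iters

# Illustrative smooth, non-quadratic objective f(x) = e^x + e^{-2x},
# minimized at x* = log(2) / 3 (our choice for the demo).
grad = lambda x: np.exp(x) - 2.0 * np.exp(-2.0 * x)
x_star = np.log(2.0) / 3.0

rng = np.random.default_rng(2)
gamma = 0.05
avg_g = averaged_sgd(grad, 1.0, 0.0, gamma, 200_000, rng)       # bias O(gamma)
avg_2g = averaged_sgd(grad, 1.0, 0.0, 2 * gamma, 200_000, rng)  # bias O(2*gamma)
x_rr = 2.0 * avg_g - avg_2g  # Richardson-Romberg: first-order bias cancels
print(x_rr - x_star)
```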
Analysis of Langevin Monte Carlo via convex optimization
In this paper, we provide new insights on the Unadjusted Langevin Algorithm.
We show that this method can be formulated as a first-order optimization algorithm of an objective functional defined on the Wasserstein space of order 2. Using this interpretation and techniques borrowed from convex optimization, we give a non-asymptotic analysis of this method to sample from log-concave smooth target distributions on ℝ^d. Based on this interpretation, we propose two new methods for sampling from a non-smooth target distribution, which we analyze as well. Moreover, these new algorithms are natural extensions of the Stochastic Gradient Langevin Dynamics (SGLD) algorithm, which is a popular extension of the Unadjusted Langevin Algorithm. Similar to SGLD, they only rely on approximations of the gradient of the target log density and can be used for large-scale Bayesian inference.
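SGLD, mentioned above, replaces the exact gradient of the log posterior in ULA with an unbiased minibatch estimate. A minimal sketch on a conjugate Gaussian-mean model chosen for illustration (the prior, data and step size are our assumptions, not the paper's examples):

```python
import numpy as np

def sgld(grad_log_post_est, theta0, step, n_iters, rng):
    """Stochastic Gradient Langevin Dynamics: ULA driven by a minibatch
    (unbiased) estimate of the gradient of the log posterior."""
    theta = theta0
    out = np.empty(n_iters)
    for k in range(n_iters):
        theta = (theta + 0.5 * step * grad_log_post_est(theta, rng)
                 + np.sqrt(step) * rng.standard_normal())
        out[k] = theta
    return out

# Model (assumed): y_i ~ N(theta, 1), prior theta ~ N(0, 10).
rng = np.random.default_rng(3)
N, m = 1000, 32
data = rng.normal(0.5, 1.0, size=N)  # synthetic data, true mean 0.5

def grad_est(theta, rng):
    # Minibatch estimate of grad log posterior: prior term + rescaled batch term.
    batch = rng.choice(data, size=m, replace=False)
    return -theta / 10.0 + (N / m) * np.sum(batch - theta)

chain = sgld(grad_est, theta0=0.0, step=1e-4, n_iters=20000, rng=rng)
print(chain[5000:].mean())  # close to the posterior mean, sum(data) / (N + 0.1)
```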
Copula-like Variational Inference
This paper considers a new family of variational distributions motivated by
Sklar's theorem. This family is based on new copula-like densities on the
hypercube with non-uniform marginals which can be sampled efficiently, i.e.
with a complexity linear in the dimension of the state space. The proposed variational densities can be seen as arising from these copula-like densities used as base distributions on the hypercube, with Gaussian quantile functions and sparse rotation matrices as normalizing flows. The latter correspond to a rotation of the marginals with complexity O(d log d). We provide some empirical evidence that such a variational family can also approximate non-Gaussian posteriors and can be beneficial compared to Gaussian approximations. Our method performs largely comparably to state-of-the-art variational approximations on standard regression and classification benchmarks for Bayesian Neural Networks.
Comment: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.
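The pipeline described above (hypercube base → Gaussian quantile function → sparse rotations) can be sketched as follows. This is a simplification: we use a uniform base on the hypercube, whereas the paper's copula-like base has non-uniform marginals, and the Givens rotation angles are arbitrary illustrative choices:

```python
import numpy as np
from statistics import NormalDist

def givens(d, i, j, theta):
    """Sparse (Givens) rotation acting on coordinates i and j only."""
    r = np.eye(d)
    c, s = np.cos(theta), np.sin(theta)
    r[i, i], r[j, j], r[i, j], r[j, i] = c, c, -s, s
    return r

def sample_flow(n, d, rotations, rng):
    """Hypercube base -> Gaussian quantile function -> sparse rotations."""
    u = rng.uniform(1e-9, 1 - 1e-9, size=(n, d))  # base samples on the hypercube
    inv_cdf = np.vectorize(NormalDist().inv_cdf)
    z = inv_cdf(u)                                # marginal Gaussian quantiles
    for r in rotations:
        z = z @ r.T                               # each rotation touches 2 coordinates
    return z

rng = np.random.default_rng(4)
rots = [givens(4, 0, 1, 0.7), givens(4, 2, 3, -0.3), givens(4, 1, 2, 1.1)]
x = sample_flow(5000, 4, rots, rng)
print(np.cov(x.T).round(2))  # ~ identity: rotations preserve N(0, I)
```

With a uniform base the output is exactly Gaussian, which is why the empirical covariance stays near the identity; the non-uniform marginals of the actual copula-like base are what make the resulting family non-Gaussian.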
Sampling from a log-concave distribution with compact support with proximal Langevin Monte Carlo
This paper presents a detailed theoretical analysis of the Langevin Monte
Carlo sampling algorithm recently introduced in Durmus et al. (Efficient
Bayesian computation by proximal Markov chain Monte Carlo: when Langevin meets
Moreau, 2016) when applied to log-concave probability distributions that are
restricted to a convex body K ⊂ ℝ^d. This method relies on a regularisation procedure involving the Moreau-Yosida envelope of the indicator function associated with K. Explicit convergence bounds in total variation norm and in Wasserstein distance are established. In particular, we show that the complexity of this algorithm given a first-order oracle is polynomial in the dimension of the state space. Finally, some numerical experiments are presented to compare our method with competing MCMC approaches from the literature.
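The regularisation can be sketched as follows: the Moreau-Yosida envelope of the indicator of K has gradient (x − proj_K(x))/λ, so plain ULA can be run on the smoothed potential. The target (a Gaussian restricted to the unit ball) and the parameters below are assumptions for this demo, not the paper's experiments:

```python
import numpy as np

def myula(grad_f, proj_k, x0, step, lam, n_iters, rng):
    """ULA on the Moreau-Yosida smoothing of pi ∝ exp(-f) restricted to K.

    The non-smooth indicator of K is replaced by its Moreau-Yosida envelope,
    whose gradient is (x - proj_K(x)) / lam; ULA is then run on the smooth
    surrogate potential f + envelope.
    """
    x = np.asarray(x0, dtype=float)
    out = np.empty((n_iters, x.size))
    for k in range(n_iters):
        grad = grad_f(x) + (x - proj_k(x)) / lam
        x = x - step * grad + np.sqrt(2.0 * step) * rng.standard_normal(x.size)
        out[k] = x
    return out

# Illustrative target: standard Gaussian restricted to the unit ball,
# so f(x) = ||x||^2 / 2 and proj_K is projection onto the ball.
def proj_ball(x):
    n = np.linalg.norm(x)
    return x if n <= 1.0 else x / n

rng = np.random.default_rng(5)
s = myula(lambda x: x, proj_ball, np.zeros(2),
          step=0.01, lam=0.05, n_iters=30000, rng=rng)
inside = np.linalg.norm(s, axis=1) <= 1.0 + 3 * np.sqrt(0.05)
print(inside.mean())  # most samples lie in a small neighbourhood of the ball
```

Smaller λ concentrates the samples closer to K but stiffens the gradient (Lipschitz constant ~ 1/λ), forcing a smaller step, which is the trade-off the convergence bounds quantify.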
- …